ABSTRACT

Multivariable queries can be processed in the data
base management system INGRES. The general procedure is to decompose
the query into a sequence of one-variable queries using two
processes. One process is reduction, which requires breaking off
components of the query which are joined to it by a single variable.
The other process, tuple-substitution, involves substituting for one
of the variables a tuple at a time. The query processing algorithm
has been developed for QUEL, the data language for INGRES. Algorithms
for reduction and for choosing the variable to be substituted are
given. The decision about which variable to substitute depends on
estimation of costs, and some procedures for making cost estimates
are outlined. (Author/CH)






DECOMPOSITION-A STRATEGY FOR QUERY PROCESSING 

by 

Eugene Wong and Karel Youssefi 






Memorandum No. ERL-M574
15 January 1976 



ELECTRONICS RESEARCH LABORATORY 



College of Engineering 
University of California, Berkeley
94720



Decomposition - A Strategy for Query Processing

Eugene Wong and Karel Youssefi

Dept. of Electrical Engineering and Computer Sciences
and Electronics Research Laboratory,
University of California, Berkeley



1. Introduction

The structural simplicity of a relational data
model encourages the use of a non-procedural data sub-
language which specifies what is to be found rather than
how it is to be found. Thus, it is not surprising that
nearly every one of the relational languages which have
been proposed is non-procedural. As is generally true
with high level languages, a price which may have to be
paid is a loss of efficiency. For a relational data base
of any size and for queries spanning several relations,
the price can be fearsome. Results of various degrees of
generality on improving search strategies for a relational
data base system have been reported by Palermo
[PALE72], Astrahan and Chamberlin [ASTR75], Rothnie
[ROTH74, ROTH75], Pecherer [PECH75], Smith and Chang
[SMIT75], and Todd [TODD75]. Nonetheless, the lack of a
general approach to optimizing query processing remains a
major impediment to achieving a satisfactory degree of
efficiency for non-procedural relational languages.

The purpose of this paper is to describe in some
detail the query processing algorithm developed for QUEL
[HELD75], which is the data language for the INGRES system.
Insofar as the problems encountered in QUEL are
common to all non-procedural relational languages, their
solution should find general application.

In section 2 a brief description of QUEL, the
query language to be processed, is presented. In section
3 we sketch a skeletal outline of the decomposition algorithm,
emphasizing the functions of the component algorithms
and the flow of information and control among
them. The details of the component algorithms are
presented in subsequent sections.
2. QUEL

A complete definition of QUEL is given in
[HELD75]. Here, we shall confine ourselves to a brief
description sufficient to make the processing strategy
comprehensible. There are four commands: RETRIEVE,
REPLACE, DELETE, APPEND. An update command is turned
into a RETRIEVE command which is then followed by a low
level tuple-by-tuple operation. We shall restrict our
attention to RETRIEVE. A statement to retrieve in QUEL
has the following form.

RANGE OF (Variable [, Variable]) IS
         (Relation Name [, Relation Name])

RETRIEVE [INTO result name] (Target List)

WHERE Qualification

Example 2.1:

Consider a data base with relations

Supplier (S#, Sname, City)
Parts (P#, Pname, Size)
Supply (S#, P#, Quantity)

and a query to find the names of all parts supplied by
suppliers in New York. This can be stated in QUEL as
follows:

RANGE OF (S, P, Y) IS (Supplier, Parts, Supply)

RETRIEVE INTO NYparts (P.Pname) WHERE (P.P# = Y.P#)
                                AND (Y.S# = S.S#)
                                AND (S.City = 'New York')

From the point of view of query processing there
are two principal sources of complexity. First, QUEL
permits aggregation operators such as MAX and AVG, and
nesting of such operators. Secondly, queries involving
several variables require deft handling in order to avoid
the obvious possibility of combinatorial growth. For example,
if the query in Example 2.1 is processed by first
forming a cartesian product, then the number of tuples to
be scanned is equal to the product of the cardinalities
of the three relations. In our system all aggregations
are performed on single relations. If an aggregation is
to be done on a subset of the product of several relations,
the subset must first be assembled by processing a
multivariable query. Aggregations once evaluated are kept
for possible reuse until updates render them obsolete.
In the remainder of the paper we shall deal only with
aggregation-free queries, and the thrust of the query
processing strategy is to cope effectively with
aggregation-free but multivariable queries.

Let X = (X_1, X_2, ..., X_n) denote the variables declared in
the range statement, and let R_1, R_2, ..., R_n be their
respective ranges. Then the qualification can be considered
to be a Boolean function B(X) on the cartesian
product R = R_1 x R_2 x ... x R_n. The target list can be
considered to be a set of functions
(T_1(X), T_2(X), ...) = T(X) on the product space, and the
result relation of the query is constructed by evaluating
T(X) on the subset of R defined by B(X) = 1, and eliminating
duplicate tuples. We note that for a query free
of aggregation operators each tuple X in the product
space R contains enough information to completely determine
the values of B(X) and T(X).

The interpretation of QUEL statements suggests
the following procedure for their processing:

(a) Product: A cartesian product of the range
relations is formed.

(b) Restriction: Tuples X in the product which
satisfy B(X) = 1 are determined.

(c) Computation and Projection: T(X) is computed
on the subset determined in (b) and duplicate
tuples are eliminated.

Unfortunately, this procedure is as inefficient as
it is obvious. The cardinality of the product R
(i.e., the number of tuples in R) is equal to the
product of the cardinalities of R_i, i = 1, 2, ..., n. It
does not take very large relations or very many of
them to make this number enormous. Aside from the
difficulty of having to form and store a very large
relation, to determine the subset which satisfies
B(X) = 1 requires examining a number of tuples equal
to the cardinality of R.
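
To make the cost concrete, the product, restriction and projection steps can be sketched in Python for the query of Example 2.1. This is only an illustration: the sample tuples and the helper name naive_retrieve are hypothetical, since the paper gives the schemas but no data.

```python
from itertools import product

# Hypothetical sample tuples; the paper gives only the schemas.
supplier = [(1, "Acme", "New York"), (2, "Best", "Boston")]   # (S#, Sname, City)
parts = [(10, "Bolt", 20), (11, "Nut", 5)]                    # (P#, Pname, Size)
supply = [(1, 10, 300), (2, 11, 50), (1, 11, 10)]             # (S#, P#, Quantity)

def naive_retrieve(ranges, qualification, target):
    """Steps (a)-(c): product, restriction, computation and projection."""
    scanned = 0
    result = set()                      # a set eliminates duplicate tuples
    for x in product(*ranges):          # (a) cartesian product of the ranges
        scanned += 1
        if qualification(*x):           # (b) keep tuples with B(X) = 1
            result.add(target(*x))      # (c) evaluate T(X)
    return result, scanned

result, scanned = naive_retrieve(
    [supplier, parts, supply],
    lambda s, p, y: p[0] == y[1] and y[0] == s[0] and s[2] == "New York",
    lambda s, p, y: (p[1],),
)
```

Here scanned is always the product of the three cardinalities, whatever the qualification selects, which is the growth the decomposition strategy is designed to avoid.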




3. Decomposition



The query processing strategy that we have
adopted has two overall objectives:

(a) No cartesian product - The result relation
is to be constructed by assembling
comparatively small pieces, rather than by
paring down the cartesian product.

(b) No geometric growth - The number of
tuples to be scanned is to be kept as
small as possible, and for most queries
this number is much less than the cardinality
of R.

Our general procedure is to reduce an arbitrary multivariable
query to a sequence of single-variable
ones. We call this process decomposition. Observe
that the first objective is automatically achieved
by such an approach. To attain the second requires
a detailed examination of the tactical moves which
are available.

The decision to reduce multivariable queries
to one-variable ones separates the overall optimization
into two levels. It has obvious advantages in
structuring the optimization procedure which otherwise
may well become unbearably complex. The only
situation in which our approach may be undesirable
is when inter-relational information such as "links"
[TSIC75] is available, in which case the desirable
atomic units may be two-variable queries.

It is useful to distinguish two types of
operations which are repeatedly invoked in decomposition.

(I) Tuple substitution: An n-variable
query Q is replaced by a family of
(n-1)-variable queries resulting from substituting
for one of its variables tuple
by tuple.

(II) Detachment of a subquery with a single
overlapping variable: A query Q is
replaced by Q' followed by Q" such that Q'
and Q" have only a single variable in
common.

Operations of these two types suffice to
decompose any query completely. Indeed, a series of
successive tuple substitutions is sufficient, albeit
tantamount to forming the cartesian product. Tuple
substitution for a single variable means that the
cost of processing the remaining portion of the
query is multiplied by a factor which in most cases
is equal to the cardinality of the range of the substituted
variable. It is important, therefore, that
the ranges of the variables be reduced as much as
possible before substitution takes place. The most
straightforward way of doing this is through restriction
and projection, which are special cases of
(II). Something equivalent to such a step has been
proposed in every paper on optimizing query processing.

Example 3.1

Consider a data base with three relations

Supplier (S#, Sname, City)
Parts (P#, Pname, Size)
Supply (S#, P#, Quantity)

and a query Q:

RANGE OF (S, P, Y) IS (Supplier, Parts, Supply)

RETRIEVE (S.Sname) WHERE (S.City = 'New York')
                   AND (P.Pname = 'Bolt')
                   AND (P.Size = 20)
                   AND (Y.S# = S.S#)
                   AND (Y.P# = P.P#)
                   AND (Y.Quantity > 200)



If we represent a detachment of Q' from Q leaving Q"
by the binary tree

[Figure: binary tree with root Q and branches Q' and Q"]

then the successive detachment of subqueries can be
represented by

[Figure: tree showing the successive detachment of Q1, Q2, Q3, Q4 and Q5 from Q]

Q1: (P.P#) WHERE (P.Size = 20) AND (P.Pname = 'Bolt')

Q2: (Y.P#, Y.S#) WHERE (Y.Quantity > 200)

Q3: (S.S#, S.Sname) WHERE (S.City = 'New York')

Q4: (Y.S#) WHERE (Y.P# = P.P#)

Q5: (S.Sname) WHERE (Y.S# = S.S#)



In this example operations of type II have reduced Q
to three one-variable queries Q1, Q2, Q3 which can
be processed in parallel or in arbitrary order, followed
by a 2-variable query Q4, and then another
2-variable query Q5. Q4 and Q5 cannot be further
reduced by operations of type II, and tuple-substitution
must be used to complete the decomposition.
We note, however, the ranges of the variables
in Q4 and Q5 are likely to be very much smaller than
the original relations, and tuple substitution at
these stages is relatively harmless. As an example
of tuple substitution, consider

Q5: RETRIEVE (S.Sname) WHERE (Y.S# = S.S#)

Suppose that at this point the range of Y is the relation

S#
101
107
203

Then, successive substitution for Y yields

Q5(101): RETRIEVE (S.Sname) WHERE (S.S# = 101)
Q5(107): RETRIEVE (S.Sname) WHERE (S.S# = 107)
Q5(203): RETRIEVE (S.Sname) WHERE (S.S# = 203)



We note that unlike SEQUEL [ASTR75], QUEL has no
block structure and there is no a priori preferential
order of variables in substitution.

The general situation covered by (II) is the
following: Consider a query of the form

    RANGE OF (X_1, X_2, ..., X_n) IS (R_1, R_2, ..., R_n)

Q   RETRIEVE T(X_1, X_2, ..., X_m)

    WHERE B"(X_1, X_2, ..., X_m)

    AND B'(X_m, X_m+1, ..., X_n)

It is natural to break off B' to form

    RANGE OF (X_m, X_m+1, ..., X_n) IS (R_m, R_m+1, ..., R_n)

Q'  RETRIEVE INTO R_m' (T'(X_m))

    WHERE B'(X_m, X_m+1, ..., X_n)

where T'(X_m) contains the information on X_m needed
by the remainder of the query which can now be expressed as

    RANGE OF (X_1, X_2, ..., X_m) IS (R_1, R_2, ..., R_m-1, R_m')

Q"  RETRIEVE T(X_1, X_2, ..., X_m)

    WHERE B"(X_1, X_2, ..., X_m)

Observations: (1) Q" is necessarily
simpler than the original query Q since m <= n and
R_m' is smaller than R_m. Even for the worst possible
case where R_m' = R_m and m = n, Q" is no worse than Q.

(2) The detachment of Q' does not lead to an increase
in the maximum number of variables for which
substitution has to be made. To see this, note that
the maximum number of variables to be substituted
for in an n-variable query is n-1. Hence, this
number is (n-m+1)-1 for Q' and m-1 for Q" so that
the total is again n-1.

(3) Q' and Q" are strictly
ordered. Q' needs no information from Q" so that it
can be processed completely before processing on Q"
begins. At any given time we only need to deal with
a total of n or less variables.
Two special cases of one-overlapping-variable
subqueries are worthy of special note.
First, it may happen that the detached subquery Q'
has no variable in common with the remainder Q".
That is, B' is a function of only (X_m+1, ..., X_n) and
not of X_m. In such a case we shall say Q' is a
disjoint subquery. The interpretation of this situation
is that if B' is satisfied by a nonempty set
then Q is equivalent to Q", otherwise Q is itself
void, i.e., its result is empty. The second special
case arises when m = n and B' is a one-variable query.
This is a frequent and important occurrence, as the
previous example illustrates. We say a query is
connected if it has no disjoint subquery, one-free
if it has no one-variable subquery, and irreducible
if it has no one-overlapping-variable subquery. An
irreducible query is obviously both connected and
one-free.

Broadly speaking, we will always break up a
query into irreducible components before tuple-substitution.
In effect, we will always prefer not
to tuple-substitute if it can be avoided or postponed.
Although examples can be constructed to show
that such a choice is not always optimal, in general
this is not a bad heuristic. Detaching subqueries
involves an additive growth in complexity, while
tuple-substitution incurs a multiplicative growth.
Our decomposition algorithm is recursively applied
to all the subqueries which are generated.

The Decomposition Algorithm consists of four
sub-algorithms: Reduction, Subquery Sequencing,
Tuple Substitution and Variable Selection, and makes
use of the One-Variable Processor of the system.
The interaction among these component processes is
indicated in Figure 3.1 below.
[Figure: boxes labeled Reduction, Subquery Sequencing, Tuple Substitution, Variable Selection and One-Variable Processor, connected by call and return arrows]

Figure 3.1 Flow of Control in Decomposition



The fact that the decomposition algorithm is recursive
is made clear by the existence of a sequence of
calling-paths (Reduction - Subquery Sequencing - Tuple
Substitution - Reduction) which form a cycle. The
basic functions of the sub-algorithms are as follows:

(a) Reduction breaks up the query into irreducible
components and puts them in a certain sequential
order.

(b) Subquery Sequencing uses the result of Reduction
and generates in succession subqueries each of which
contains a single irreducible component together
with one-variable clauses. As each subquery is generated
it is passed to Tuple Substitution, and the
generation of the next subquery awaits return of the
result.

(c) Tuple Substitution manages the process of substituting
tuple values. It calls Variable Selection
to select a single variable for substitution. After
substituting each tuple for that variable, it passes
the resulting reduced query to Reduction and awaits
the return before substituting the next value.

(d) Variable Selection is where most of the optimization
takes place. It estimates the relative cost
of substituting for each variable, and chooses the
variable with the minimum estimated cost. In so doing,
it may have to preprocess some one-variable
subqueries.

The details of the sub-algorithms will be described
in the next few sections.
4. Reduction Algorithm

The input consists of a multivariable query
Q, and the output consists of the irreducible components
of Q arranged in an appropriate sequential
order. This sequence is passed to Subquery
Sequencing, and the result relation for Q is returned.
The basic steps of the algorithm are illustrated
below.

[Figure: flowchart with boxes labeled "separate into disjoint components", "separate into irreducible components" and "output sequence to Subquery Sequencing", with yes and no branches]

Figure 4.1 Reduction Algorithm



Let X = (X_1, X_2, ..., X_n) denote the variables
of Q and let T(X) and B(X) denote its target list
and qualification respectively. We assume that B(X)
is expressed in conjunctive normal form

B(X) = C_1(X) AND C_2(X) AND ... AND C_p(X)

where each clause C_i(X) contains only disjunctions.
Now consider a binary (0 or 1) matrix with p+1 rows
corresponding to T(X) and the p clauses, and with n
columns corresponding to the variables X_1, ..., X_n.
An entry of 1 will denote the presence of a variable
in a clause (or target list), and 0 will denote its
absence. We shall call this the incidence matrix.
For example 3.1 this matrix is given by



                             S   P   Y

T:  S.Sname                  1   0   0
C1: S.City='New York'        1   0   0
C2: P.Pname='Bolt'           0   1   0
C3: P.Size=20                0   1   0
C4: Y.S#=S.S#                1   0   1
C5: Y.P#=P.P#                0   1   1
C6: Y.Quantity>200           0   0   1

We note that in Figure 4.1 there are two
steps for which detailed algorithms remain to be
provided. First, we need a test for connectedness,
and to separate Q into disjoint components if it is
not connected. Second, we need an algorithm to
separate a connected query into irreducible components
and to put them in a suitable sequential
order.
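
As an illustration, the incidence matrix of example 3.1 can be built mechanically from the set of variables that each row mentions. The sketch below is in Python; the names VARIABLES, ROWS and incidence_matrix are hypothetical, and the variable sets are transcribed by hand from the target list and the clauses C1 to C6.

```python
# Columns of the incidence matrix, in the order used in the text.
VARIABLES = ["S", "P", "Y"]

# One entry per row: a label and the variables that the row mentions.
ROWS = [
    ("T", {"S"}),         # S.Sname
    ("C1", {"S"}),        # S.City = 'New York'
    ("C2", {"P"}),        # P.Pname = 'Bolt'
    ("C3", {"P"}),        # P.Size = 20
    ("C4", {"Y", "S"}),   # Y.S# = S.S#
    ("C5", {"Y", "P"}),   # Y.P# = P.P#
    ("C6", {"Y"}),        # Y.Quantity > 200
]

def incidence_matrix(rows, variables):
    """Return one 0/1 row per target list or clause, one column per variable."""
    return [[1 if v in used else 0 for v in variables] for _, used in rows]

matrix = incidence_matrix(ROWS, VARIABLES)
```

Rows with a single 1 are one-variable clauses; rows with two 1's (C4 and C5 here) are the clauses that tie variables together.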
(a) Connectivity Algorithm

[Figure: flowchart with boxes "form the logical or of all rows with 1 in column 1" and "of the rows with 1 in column 1, replace the first by the logical or and delete the rest", and exits labeled "connected" and "not connected"]

Figure 4.2 Connectivity

If the connectivity algorithm results in a
matrix with a single row which is not all 1's, then
the variables corresponding to the zero entries are
superfluous and can be eliminated. If the final matrix
has more than one row, then the sets of variables
corresponding to different rows must be disjoint.
If we keep track of the original rows which
are combined to make up each of the rows of the final
matrix, then the connected components of the
query can be separated.

Consider example 3.1, modified by the deletion
of C4. The incidence matrix now has the form

        S   P   Y

T       1   0   0
C1      1   0   0
C2      0   1   0
C3      0   1   0
C5      0   1   1
C6      0   0   1



Applying the connectivity algorithm, we get
successively

               S   P   Y

T,C1           1   0   0
C2             0   1   0
C3             0   1   0
C5             0   1   1
C6             0   0   1

               S   P   Y

T,C1           1   0   0
C2,C3,C5       0   1   1
C6             0   0   1

               S   P   Y

T,C1           1   0   0
C2,C3,C5,C6    0   1   1

Hence, the query is not connected and the connected
components are (T,C1) and (C2,C3,C5,C6).
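
The row-merging procedure traced above can be sketched in Python as follows. This is a minimal sketch under stated assumptions rather than the INGRES implementation: the function name connected_components and the representation of a row as a pair (bits, labels) are hypothetical. For each column in turn, the rows with a 1 in that column are replaced by their logical or, and the labels of the merged rows are kept so that the components can be read off at the end.

```python
def connected_components(matrix, labels):
    """Merge rows column by column; return a list of (bits, labels) pairs."""
    rows = [(list(bits), [name]) for bits, name in zip(matrix, labels)]
    width = len(matrix[0]) if matrix else 0
    for col in range(width):
        merged = None                  # first row found with a 1 in this column
        kept = []
        for bits, names in rows:
            if bits[col] == 0:
                kept.append((bits, names))
            elif merged is None:
                merged = (bits, names)
                kept.append(merged)
            else:
                # Logical or into the first such row, then drop this row.
                for j, bit in enumerate(bits):
                    merged[0][j] |= bit
                merged[1].extend(names)
        rows = kept
    return rows

# Example 3.1 with clause C4 deleted.
labels = ["T", "C1", "C2", "C3", "C5", "C6"]
matrix = [
    [1, 0, 0],
    [1, 0, 0],
    [0, 1, 0],
    [0, 1, 0],
    [0, 1, 1],
    [0, 0, 1],
]
components = connected_components(matrix, labels)
```

The query is connected exactly when a single row remains. One pass over the columns is enough, because after a column has been processed at most one row has a 1 in it, and later merges cannot change that.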

(b) Reduction into Irreducible Components

Let Q be a connected multivariable query.
We observe that it is reducible if the elimination
of any one variable results in Q being disconnected.
Let a variable with this property be called a
joining-variable. Thus, Q is irreducible if and
only if none of its variables is a joining-variable.
Joining-variables have some important properties
which greatly facilitate the reduction algorithm,
and these are summarized as follows:

Proposition 4.1 Suppose that X is a joining-variable
of Q such that its removal disconnects Q
into k connected components. Then any joining-variable
of one of the components is a joining-variable
of Q, and every joining-variable of Q is a
joining-variable of one of the components. Further,
successive elimination of two joining variables in
either order results in reducing Q to the same disjoint
components.

Proof: Each joining-variable joins a number of
components which can overlap only on the joining-variable.
Let X be a joining-variable of Q which
joins components Q_1, Q_2, ..., Q_k. Let Y be a joining
variable of one of these components, say Q_1. Then,
Y joins components Q_11, Q_12, ..., Q_1j of Q_1, only one
of which can contain X, say Q_11. Therefore,
{Q_12, ..., Q_1j} overlaps the remainder of Q only on Y
and Y is a joining-variable of Q. Conversely, let Y
be a joining-variable of Q, and join components Q_1',
Q_2', ..., Q_j'. Only one of the set {Q_1', Q_2', ..., Q_j'}
can contain X, say Q_1', and only one of the set {Q_1,
Q_2, ..., Q_k} can contain Y, say Q_1. Then
{Q_2', ..., Q_j'} and {Q_2, ..., Q_k} must be disjoint since
each Q_i, i >= 2, can overlap its remainder in Q only
on X and none of {Q_2', ..., Q_j'} contains X. Hence,
Q_2', ..., Q_j' are subsets of Q_1 joined to it only by
Y, so that Y is a joining-variable of Q_1. It is
clear that Q has components {Q_2, Q_3, ..., Q_k} each
joined by only X, {Q_2', ..., Q_j'} each joined by
only Y, and a component Q_0 joined by both X and Y.
Elimination of X and Y in either order results in
disjoint components consisting of each Q_i (i >= 2)
with X removed, each Q_i' (i >= 2) with Y removed,
and Q_0 with both X and Y removed.



The substance of Proposition 4.1 is illustrated
by Figure 4.3.

[Figure: diagram of components joined by the variables X and Y]

Figure 4.3 Joining-Variables



The results of Proposition 4.1 mean that we
can find the irreducible components of Q by successively
checking each variable for the possibility of
being a joining variable. Each variable only needs
to be examined once, and the order they are tested
is immaterial. Further, since a variable is joining
if and only if its elimination disconnects Q, we can
use the connectivity algorithm for the test.



Take the incidence matrix of Q and eliminate
from it all rows with only a single 1. Beginning
with the first, eliminate each column in turn and
test for connectedness. Suppose that when column m
is eliminated Q breaks up into k connected components
with n_1, n_2, ..., n_k variables respectively.
Then, these correspond to components of Q with n_1+1,
n_2+1, ..., n_k+1 variables respectively, any pair of
which overlap only on X_m. We can now proceed to
test columns m+1, ..., n. We note that each of the
variables X_m+1, ..., X_n occurs in only one of the components
so that after the mth column (i.e., the
first joining-variable) the tests are performed on
matrices of reduced size.
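
The test for joining-variables can be sketched in Python as below. It is a simplified illustration with hypothetical names (count_components, joining_variables): it drops the rows with a single 1, eliminates each column in turn and counts the connected components that remain, but it does not carry out the later tests on the reduced matrices of each component. It assumes that Q is connected to begin with.

```python
def count_components(rows):
    """Count groups of rows that are linked through shared columns."""
    groups = []                        # each group is a set of column indices
    for row in rows:
        cols = {j for j, bit in enumerate(row) if bit}
        for g in [g for g in groups if g & cols]:
            cols |= g                  # absorb every group this row touches
            groups.remove(g)
        groups.append(cols)
    return len(groups)

def joining_variables(matrix, variables):
    """Return the variables whose elimination disconnects the query."""
    core = [row for row in matrix if sum(row) > 1]   # drop rows with a single 1
    found = []
    for col, name in enumerate(variables):
        reduced = [row[:col] + row[col + 1:] for row in core]
        if count_components(reduced) > 1:
            found.append(name)
    return found

# Incidence matrix of example 3.1 (rows T, C1, ..., C6; columns S, P, Y).
example = [
    [1, 0, 0],
    [1, 0, 0],
    [0, 1, 0],
    [0, 1, 0],
    [1, 0, 1],
    [0, 1, 1],
    [0, 0, 1],
]
joining = joining_variables(example, ["S", "P", "Y"])
```

For example 3.1 only Y is found to be joining, which agrees with the detachment of Q4 (in P and Y) from Q5 (in S and Y).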



Each irreducible component of Q corresponds
to one or more rows of the incidence matrix, and can
be represented by the "logical or" of the
corresponding rows. Hence, Q can be represented in
terms of its irreducible components by a matrix with
variables as columns and components as rows. We
shall call this the reduced incidence matrix. It is
convenient to arrange the rows as follows:

(1) One-variable rows except the target list.

(2) Components which are one-overlapping after
deletion of one-variable clauses and which do
not contain the target list. These should be
grouped according to the joining variable.

(3) Other components which do not contain the
target list.

(4) The component which contains the target
list.

For example 3.1 the resulting reduced incidence matrix
is given by:

        S   P   Y

C1      1   0   0
C2      0   1   0
C3      0   1   0
C6      0   0   1
C5      0   1   1
T,C4    1   0   1
In order to use (7.5) in (7.6) we must know
the number of pages occupied by the range relations
for every Q in the sequence S_i. We note that S_i is
a sequence, and not a set, so that the range relation
of a query may involve the result relations of
queries which precede it. Therefore, knowing the
sizes of the range relations of Q is not sufficient
to determine (7.5) for the Q's. Since we don't
want to execute the sequence except for the optimal
i, we must rely on a procedure to estimate the
sizes and other parameters of the result relation
for a query.



Consider a query Q with range relations
R_1, ..., R_n, a target list T(X) and a qualification
B(X). Let the domains of R_i be denoted by D_ij,
j = 1, 2, ..., d_i. Each R_i is by definition a subset of
D_i1 x D_i2 x ... x D_id_i. Hence, the product
R_1 x ... x R_n is a subset of

(8.1)    D = product of D_ij over all i and j

To determine what subset of R_1 x ... x R_n satisfies
B(X) = 1 requires accesses to the actual relations, but to
determine what subset of D satisfies B(X) = 1 requires
only knowing the domains {D_ij}. The storage required
to represent {D_ij} is, in general, far less
than that required for {R_i}.



Let R(Q) denote the result relation of Q.
We can estimate the cardinality of R(Q) as

(8.2)    |R(Q)| = (product of |R_i| over i <= n) x {fraction of D satisfying B(X) = 1}
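
The estimate (8.2) can be sketched in Python for small, explicitly enumerated domains. This is an illustration only: the function name estimate_cardinality, the flat list of domains and the sample numbers are hypothetical, and a real system would work from a few parameters per domain rather than enumerate D.

```python
from fractions import Fraction
from itertools import product
from math import prod

def estimate_cardinality(cardinalities, domains, qualification):
    """Estimate |R(Q)| as (product of |R_i|) x (fraction of D with B(X) = 1)."""
    satisfied = total = 0
    for x in product(*domains):        # enumerate every point of D
        total += 1
        satisfied += bool(qualification(*x))
    return prod(cardinalities) * Fraction(satisfied, total)

# Hypothetical case: two range relations of 50 and 20 tuples, each with one
# integer domain 0..9, joined on equality of that domain.
estimate = estimate_cardinality(
    [50, 20],
    [range(10), range(10)],
    lambda a, b: a == b,
)
```

In this case 10 of the 100 points of D satisfy the qualification, so the 1000 tuples of the product are scaled by one tenth. The sketch assumes every domain is nonempty.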



The domains of R(Q) can be estimated by evaluating
T(X) on the subset of D which satisfies B(X) = 1.
That is, the kth domain of R(Q) is estimated to be

(8.3)    {T_k(X) : X in D, B(X) = 1}.



In most cases D_ij has sufficient regularity
to permit it to be represented by just a few parameters.
For example, D_ij might be simply all integers
between a and b. Thus, the storage requirement for
keeping track of the domains for the result relations
of the sequence S_i can be expected to be reasonable.
6. Tuple Substitution

The input to tuple substitution is a query Q
consisting of a single irreducible component in n
variables X_1, ..., X_n, zero or more one-variable
clauses in each of the variables, and the range relations
R_1, R_2, ..., R_n of the variables. It returns
the result relation to the calling process.

The first thing that Tuple Substitution does
is to call Variable Selection which takes Q and the
range relations and chooses a variable to be substituted
for. In order to make this choice it may have
to process some or all of the one-variable clauses
to restrict the ranges. Thus, in general, it returns
Q', R_1', R_2', ..., R_n' and the variable to be
substituted for (say X_1). For each tuple α in R_1',
Q' becomes an (n-1)-variable query Q'(α) in the
variables X_2, ..., X_n. For each α, Q'(α) is
passed to Reduction which returns the result. The
results to Q'(α) for all α in R_1' are accumulated
and returned to the calling process.
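
The substitution loop can be sketched in Python as follows. The sketch is deliberately simplified and its names (one_variable, tuple_substitute) are hypothetical: it chooses the variable with the smallest range instead of calling a cost-based Variable Selection, and it recurses directly instead of passing each Q'(α) to Reduction, so it shows only the accumulation of results over α. The supplier names in the sample data are invented.

```python
def one_variable(name, rng, bound, qualification, target):
    """Stand-in for the one-variable processor: scan a single range."""
    out = set()
    for t in rng:
        env = dict(bound, **{name: t})
        if qualification(env):
            out.add(target(env))
    return out

def tuple_substitute(ranges, qualification, target, bound=None):
    """Substitute tuple by tuple until one variable is left."""
    bound = bound or {}
    if len(ranges) == 1:
        (name, rng), = ranges.items()
        return one_variable(name, rng, bound, qualification, target)
    # Simplified selection: the variable with the smallest range.
    chosen = min(ranges, key=lambda v: len(ranges[v]))
    rest = {v: r for v, r in ranges.items() if v != chosen}
    result = set()
    for alpha in ranges[chosen]:       # one (n-1)-variable query per tuple
        result |= tuple_substitute(rest, qualification, target,
                                   {**bound, chosen: alpha})
    return result

# Q5 of example 3.1: RETRIEVE (S.Sname) WHERE (Y.S# = S.S#)
ranges = {
    "S": [(101, "Acme"), (107, "Best"), (300, "Core")],   # (S#, Sname), invented
    "Y": [(101,), (107,), (203,)],                        # (S#)
}
names = tuple_substitute(
    ranges,
    lambda env: env["Y"][0] == env["S"][0],
    lambda env: (env["S"][1],),
)
```

Without the intervening calls to Reduction this amounts to scanning the cartesian product, which is why the full algorithm reduces the query again after every substitution.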
7. Variable Selection

This is the heart of optimization. The input
is a multivariable query which is irreducible
except for one-variable clauses. As its name suggests,
the task of this portion of the decomposition
algorithm is to select a variable for substitution,
although to do so it may also have to process some
of the one-variable clauses.

Consider a query Q with variables
X_1, ..., X_n and ranges R_1, ..., R_n. Suppose that X_1
is substituted tuple-by-tuple. For each tuple α, Q
becomes an (n-1)-variable query Q_1(α). It is likely
that Q_1(α) takes the same amount of time to process
for every α, and in most instances every α in
R_1 has to be used. Hence,

(7.1)    Cost of processing Q if X_1 is substituted
             = (cardinality of R_1) x (cost of processing Q_1)

The first thought, therefore, is to choose X_i with
the smallest range. However, this is not optimal
for several reasons.

First, it may be possible to reduce some or
all of the relations R_1, R_2, ..., R_n by preprocessing
one-variable clauses. Should this be done for all,
for some, or for none of the variables? If all of
the R_i's can be reduced, this decision alone involves
2^n choices. The situation is further complicated
by the fact that for a given variable the
decision as to whether to preprocess the one-variable
clauses depends on whether the variable is
chosen for substitution. If it will be chosen for
substitution then its range should be reduced as
much as possible. If not, preprocessing may be a
waste of time. On the other hand which variable
should be chosen depends not so much on R_i as on the
reduced R_i. Let Q(X_i) denote the one-variable
subquery of Q in X_i, and let R_i' be the reduced
range after Q(X_i) is processed. The following policies
seem to be reasonable alternatives:

• ^ ' . ■ ■ ' ■ : :■■ 

(a) Preprocess every Q(X_i), basing the policy
on the argument that the cost of processing
one-variable queries is relatively small and it
is important to choose the variable for substitution
well.

(b) On the basis of Q(X_i), a decision is made
for each variable whether to preprocess or not.
Variable selection takes place after preprocessing.



The version of INGRES completed in January, 
1976, opts for policy (a)* In part, it is because 
in this version the varjkble selection is then based 
solely cn the cardinalities of the reduced ranges 
and no other information* It is important, - there-, 
fore, for these cardinalities to be accurate. 



For (b) a workable policy is to use $Q(X_i)$ to estimate the size of
$R_i'$ for each $i$, and preprocess only if $X_i$ is likely to be a
contender for selection. For example, we might choose the top three
contenders for preprocessing, or preprocess every variable for which
the estimated size of $R_i'$ is less than $\min_j |R_j|$. One good
feature of (b) is that except for very unusual situations, the actual
variable selected will be among those which have been preprocessed,
and no further processing is necessary before substitution.
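
The two example rules for policy (b) can be sketched as follows. This is only an illustration, not the INGRES code: the function name, its arguments, and the way the size estimates are supplied are all assumptions made for the sketch.

```python
def choose_for_preprocessing(est_reduced, current, top_k=None):
    """Return indices of variables whose one-variable clauses get preprocessed.

    est_reduced[i] -- estimated size of the reduced range R_i' (hypothetical input)
    current[i]     -- size of the unreduced range R_i
    top_k          -- if given, keep the top_k smallest estimates; otherwise
                      keep every i whose estimate is below min_j |R_j|
    """
    indices = range(len(est_reduced))
    if top_k is not None:
        # Rule 1: the top_k contenders, i.e. the smallest estimated ranges.
        ranked = sorted(indices, key=lambda i: est_reduced[i])
        return sorted(ranked[:top_k])
    # Rule 2: every variable whose reduced range is estimated to be
    # smaller than the smallest unreduced range.
    threshold = min(current)
    return [i for i in indices if est_reduced[i] < threshold]
```

With estimates `[50, 400, 10, 900]` and current sizes `[1000, 500, 200, 900]`, the first rule with `top_k=3` keeps variables 0, 1 and 2, while the second rule keeps only variables 0 and 2.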



A second and more important objection to the strategy of choosing
$X_i$ with the smallest range is that the complexity of $Q_i(\alpha)$
can vary greatly with $i$, and this must be taken into account in any
strategy which lays claim to being even near-optimal. What must be
determined is the extent to which $Q$ can be reduced as a consequence
of substituting for $X_i$.

Assume that we choose either (a) or (b) for the policy on
preprocessing one-variable clauses so that that decision is decoupled
from the selection of variable. We can assume that the query at this
point consists of a single irreducible component with some
one-variable clauses. The crux of the matter is how the irreducible
component is affected by substitution. Assume that whatever
preprocessing is to be done has been done. Let the query be denoted
by $Q$. Let $X_1, X_2, \ldots, X_n$ be the variables, and let
$R_1, R_2, \ldots, R_n$ be their ranges. Let $Q_i(\alpha)$ denote the
resulting query from substituting $\alpha$ for $X_i$ in $Q$. Let
$C(Q)$ denote the minimum cost of processing $Q$. Then

$$C(Q) = \min_i \Bigl\{ \sum_{\alpha \in \tilde{R}_i} C(Q_i(\alpha)) \Bigr\} \tag{7.2}$$

where $\tilde{R}_i$ denotes the set of tuple-values which have to be
substituted for $X_i$. In most instances this is simply $R_i$,
although as indicated earlier there are exceptions.



Equation (7.2) is a dynamic programming equation for the optimization
problem at hand. As it stands, it is not too useful, since how $C(Q)$
depends on $Q$ is not known. However, (7.2) is a suitable starting
point for optimization. The variable selected will correspond to the
value of $i$ which minimizes an estimated value for

$$\sum_{\alpha \in \tilde{R}_i} C(Q_i(\alpha)) \tag{7.3}$$

Although we have, in effect, transferred the optimization problem to
one of estimating cost, the latter is amenable to a variety of
heuristic approaches. Consider some of these:

(i) Suppose we take the estimate of $C(Q_i(\alpha))$ to be
independent of $\alpha$ and $i$. Then the minimum cost corresponds to
the smallest $|\tilde{R}_i|$. This somewhat simplistic policy is what
has been implemented in the version of INGRES operational as of
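
A minimal sketch of variable selection by minimizing an estimate of (7.3) follows; heuristic (i) is the special case in which the per-tuple estimate is a constant, so the smallest range wins. The names `select_variable` and `est_cost` are hypothetical, and the ranges are represented simply as Python lists of tuples.

```python
def select_variable(ranges, est_cost=None):
    """Pick the index i minimizing the estimated sum in (7.3).

    ranges[i]          -- the tuples that would be substituted for X_i
    est_cost(i, alpha) -- assumed estimate of C(Q_i(alpha)); when omitted,
                          a constant is used, which is heuristic (i)
    """
    if est_cost is None:
        est_cost = lambda i, alpha: 1.0  # independent of alpha and i
    totals = [sum(est_cost(i, a) for a in r) for i, r in enumerate(ranges)]
    return min(range(len(ranges)), key=lambda i: totals[i])
```

With the constant estimate the function returns the variable with the fewest tuples; supplying a non-constant `est_cost` can change the choice, which is the point of heuristics (ii) and (iii).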




(ii) We observe that unlike $Q$, $Q_i(\alpha)$ is not irreducible.
One should therefore call Reduction-Subquery-Sequencing to reduce
$Q_i(\alpha)$ to a sequence $S_i$ of subqueries each of which is
irreducible except for one-variable clauses. Now, $\alpha$ enters the
subqueries only as a parameter, and the sequence $S_i$ is really
independent of $\alpha$. Thus, we have

$$C(Q_i(\alpha)) = \sum_{q_j \in S_i} C(q_j) \tag{7.4}$$

Since the structure of $Q_i(\alpha)$ has now been represented, we can
accept a relatively crude estimate for $C(q_j)$. For example, we
might take the estimate of $C(q_j)$ to be

$$\prod_k P(R_k) \tag{7.5}$$

where $R_k$ are the ranges of $q_j$ and $P(R)$ is the number of pages
occupied by $R$.
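
As a rough illustration of heuristic (ii), the sketch below sums a crude per-subquery estimate over the sequence of subqueries. It assumes that the estimate (7.5) combines the page counts of a subquery's ranges by multiplication; that reading, the function names, and the representation of a subquery as a list of page counts are all assumptions of the sketch.

```python
from math import prod

def estimate_subquery_cost(pages):
    """Crude estimate for one subquery q_j: product of the page counts
    of its range relations (assumed form of (7.5))."""
    return prod(pages)

def estimate_substituted_cost(sequence):
    """Estimate of C(Q_i(alpha)) as in (7.4): the sum of the estimated
    costs of the subqueries q_j in the sequence S_i."""
    return sum(estimate_subquery_cost(pages) for pages in sequence)
```

For a sequence of three subqueries whose ranges occupy `[10, 4]`, `[3]` and `[2, 5, 1]` pages, the estimate is 40 + 3 + 10 = 53.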



(iii) We might try to obtain an estimate for the cost by sampling.
Consider the equation obtained from using (7.4) in (7.2):

$$C(Q) = \min_i \Bigl\{ \sum_{\alpha \in \tilde{R}_i} \sum_{q_j \in S_i} C(q_j) \Bigr\} \tag{7.6}$$

This is truly recursive, since $Q$ and $q_j$ are queries of the same
restricted type (viz., irreducible except for one-variable clauses).
If the number of variables in $Q$ is not enormous (in practice, very
few queries contain more than 4 or 5 variables) one might try to push
the recursion (7.6) all the way down to one-variable queries, but
using small samples for the range relations of $Q$. It is very likely
that the costs of different paths in the decision tree vary widely,
and only a few are contenders for the optimal path. With efficient
management, this approach need not be prohibitive.
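
One simple way to use a small sample, shown here only as a sketch, is to evaluate the per-tuple cost on a random sample of the range and scale the sum up to the full range. This illustrates a single level of the recursion, not the full decision tree; `sampled_total_cost`, `cost_of` and the fixed random seed are assumptions of the sketch.

```python
import random

def sampled_total_cost(range_tuples, cost_of, sample_size, rng=None):
    """Estimate the sum over alpha of cost_of(alpha) from a random sample.

    range_tuples -- list of tuples that would be substituted for X_i
    cost_of      -- assumed estimator of C(Q_i(alpha)) for one tuple
    sample_size  -- number of tuples to sample (capped at the range size)
    """
    rng = rng or random.Random(0)  # fixed seed keeps the sketch repeatable
    n = len(range_tuples)
    if n == 0:
        return 0.0
    k = min(sample_size, n)
    sample = rng.sample(range_tuples, k)
    # Scale the sampled sum by n / k to estimate the sum over the whole range.
    return (n / k) * sum(cost_of(a) for a in sample)
```

When the sample covers the whole range the result is the exact sum; with a smaller sample the accuracy depends on how much the per-tuple cost varies.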

These are but three possible approaches to estimating $C(Q)$. Other
approaches including some variants and combinations of these are
under consideration. We expect to implement at least the three
outlined above for experimental evaluation. Indeed, (i) has been
implemented, and (ii) is in the process of being implemented.



8. Estimate of Result Parameters

In order to use (7.5) in (7.4) we must know the number of pages
occupied by the range relations for every $q_j$ in the sequence
$S_i$. We note that $S_i$ is a sequence and not a set, so that the
range relation of a query may involve the result relations of queries
which precede it. Therefore, knowing the sizes of the range relations
of $Q$ is not sufficient to determine (7.5) for the $q_j$'s. Since we
don't want to execute the sequence $S_i$ except for the optimal $i$,
we must rely on a procedure to estimate the sizes and other
parameters of the result relation for a query.

Consider a query $Q$ with range relations $R_1, R_2, \ldots, R_n$, a
target list $T(X)$ and a qualification $B(X)$. Let the domains of
$R_i$ be denoted by $D_{ij}$, $j = 1, 2, \ldots, d_i$. Each $R_i$ is
by definition a subset of $\prod_{j \le d_i} D_{ij}$. Hence, the
product $\prod_i R_i$ is a subset of

$$D = \prod_{i \le n} \prod_{j \le d_i} D_{ij} \tag{8.1}$$

To determine what subset of $\prod_i R_i$ satisfies $B(X) = 1$
requires accesses to the actual relations, but to determine what
subset of $D$ satisfies $B(X) = 1$ requires only knowing the domains
$\{D_{ij}\}$. The storage required to represent $\{D_{ij}\}$ is, in
general, far less than that required for $\{R_i\}$.

Let $R(Q)$ denote the result relation of $Q$. We can estimate the
cardinality of $R(Q)$ as

$$|R(Q)| = \prod_{i \le n} |R_i| \cdot \{\text{fraction of } D \text{ satisfying } B(X) = 1\} \tag{8.2}$$

The domains of $R(Q)$ can be estimated by evaluating $T(X)$ on the
subset of $D$ which satisfies $B(X) = 1$. That is, the $k$th domain
of $R(Q)$ is estimated to be

$$\{T_k(X) : X \in D,\ B(X) = 1\} \tag{8.3}$$

In most cases $D_{ij}$ has sufficient regularity to permit it to be
represented by just a few parameters. For example, $D_{ij}$ might be
simply all integers between $a$ and $b$. Thus, the storage
requirement for keeping track of the domains for the result relations
of the sequence $S_i$ can be expected to be reasonable.
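
For small integer domains the estimates (8.2) and (8.3) can be illustrated by brute-force enumeration of $D$. The sketch below is an assumption-laden toy: the function name, the nested-list encoding of the domains, and the exhaustive enumeration (a real system would exploit the regularity of the domains instead) are not taken from INGRES.

```python
from itertools import product
from math import prod

def estimate_result(domains, cardinalities, qualifies, target):
    """Estimate |R(Q)| as in (8.2) and the result domains as in (8.3).

    domains[i]       -- list of the domains D_ij of R_i, each an iterable
    cardinalities[i] -- |R_i|
    qualifies(x)     -- the qualification B; x is a tuple of per-variable tuples
    target(x)        -- the target list T, returning a tuple of values
    """
    total = hits = 0
    result_domains = None
    for x in product(*(product(*doms) for doms in domains)):
        total += 1
        if qualifies(x):
            hits += 1
            row = target(x)
            if result_domains is None:
                result_domains = [set() for _ in row]
            for k, value in enumerate(row):
                result_domains[k].add(value)  # k-th domain of R(Q)
    fraction = hits / total if total else 0.0
    return prod(cardinalities) * fraction, result_domains or []
```

For two one-domain relations with domains 1..10 and 1..5, cardinalities 200 and 30, and an equality qualification, one tenth of $D$ qualifies, so the estimated cardinality is about 600 and the estimated result domain is the integers 1 through 5.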



Since the sizes of the tuples are always known, the number of pages
required for each of the result relations for the sequence can be
computed from the estimated (8.2), which in turn is computed from the
estimated domains, using (8.3).

9. Summary

In this paper we have presented a detailed account of how
multivariable queries are decomposed in system INGRES. The basic
ingredients of the strategy are:

(a) To discover pieces of a query which are joined to the remainder
by a single joining-variable;

(b) To substitute for a variable.

The overall strategy is to break up a query at the joining-variables
whenever this is possible, and to select a variable for substitution
which incurs a "minimum cost" whenever substitution can no longer be
postponed. A detailed algorithm for reducing a query into irreducible
components has been given. Alternative approaches to estimating costs
have also been discussed.

Optimization itself incurs a cost which has not been taken into
consideration. For simple queries, elaborate optimization may well do
more harm than good. The approach to resolving this difficulty that
we have opted for is one suggested by M. R. Stonebraker. Suppose that
we have two or more strategies $st_0, st_1, \ldots, st_n$, each one
being better than the previous one but also requiring a greater
overhead. Suppose we begin a query on $st_0$ and run it for an amount
of time equal to a fraction of the estimated overhead of $st_1$. At
the end of that time, by simply counting the number of tuples of the
first substitution variable which have already been processed, we can
get an estimate for the total processing time using $st_0$. If this
is significantly greater than the overhead of $st_1$, then we switch
to $st_1$. Otherwise we stay and complete processing the query using
$st_0$. Obviously, the procedure can be repeated on $st_1$ to call
$st_2$ if necessary, and so forth. The $st_i$ may correspond, for
example, to progressively more levels in the decision tree, or to
progressively more elaborate estimates of result parameters, or
better sampling.
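
The switching test can be sketched as a simple extrapolation. This is an illustration under stated assumptions, not the INGRES logic: the function name, the linear extrapolation from tuples processed so far, and the `margin` factor standing in for "significantly greater" are all hypothetical.

```python
def should_switch(elapsed, tuples_done, tuples_total, next_overhead, margin=2.0):
    """Decide whether to abandon the current strategy for the next one.

    elapsed       -- time already spent under the current strategy
    tuples_done   -- tuples of the first substitution variable processed so far
    tuples_total  -- number of tuples in that variable's range
    next_overhead -- estimated overhead of the next, more elaborate strategy
    margin        -- assumed factor expressing "significantly greater"
    """
    if tuples_done == 0:
        # No progress within the trial period: the projection is unbounded.
        return True
    projected_total = elapsed * tuples_total / tuples_done
    return projected_total > margin * next_overhead
```

For instance, if 10 of 1000 tuples were processed in 2 time units, the projected total is 200 units, which exceeds twice an overhead of 50, so the sketch recommends switching; had 500 tuples been processed in the same time, it would not.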

We have not addressed the question of optimizing the processing of
one-variable queries. Some optimization is currently being done in
INGRES, and this is described elsewhere [STON76].

In the appendix we have given a brief description of how INGRES is
implemented. The original design of the implementation was primarily
the work of M. R. Stonebraker and G. D. Held. Redesign of the
processes, and in particular the design of the query tree and the
implementation of the decomposition algorithm in the current version
(as of January, 1976), have been largely the work of Peter Kreps. We
have also included in the appendix specifications of the principal
data structures needed for our decomposition algorithm.

One of us (E.W.) is responsible for introducing the conceptual
framework in which the decomposition algorithm rests, viz. the policy
of transforming a multivariable query to one-dimensional ones, and
the strategy of alternating between reduction and tuple substitution.
We have collaborated on the reduction algorithm, and on the
heuristics for variable selection. The implementation of the full
algorithm as well as monitoring subsystems for the performance
evaluation is being designed and executed by K.A.Y. The decomposition
algorithm, being at the heart of INGRES, has enjoyed the attention of
many participants of the project. It is difficult to remember who
suggested what, but the three aforementioned colleagues have all made
important contributions. In particular, as in every aspect of INGRES,
the influence of M.R.S. is discernible throughout our algorithm.

APPENDIX A

System Organization

INGRES, Interactive Graphics and Retrieval System, runs on a PDP
11/45 under the UNIX operating system [RITC73]. The entire system is
written in the programming language "C" [RITC74]. It has four major
components which are organized as shown below.

[Figure: the four processes (user interface, parser, decomposition,
utilities) and the pipes connecting them.]

These four components are set up as processes under UNIX and
communicate through the use of pipes. The user interface can be one
of several forms: an interactive text editor, a graphics interface
[MCDO74], an interactive English-like language [CODD74], or part of a
host programming language [ALLM75]. The parser accepts the user's
query and processes it into a tree in conjunctive normal form. This
query tree and a table of relations declared in the RANGE statements
are passed to decomposition. The decomposition process contains not
only the decomposition algorithm but also the one-variable query
processor. The utilities process contains many functions which can be
used by the system or the user.



Data Structures

There are three main data structures which are used in
decomposition: the range table, the incidence matrix, and the query
tree.

Range Table:

Some of the information for this structure is gathered during parsing
and passed to decomposition as an ordered matrix. It is then put into
a matrix, each entry of which has the following form:

    struct range
    {   char relid[MAXNAME];
        struct descriptor *desc;
        /* third entry: flag, set when the variable is selected for substitution */
    };

The parser sends a table of relation names which have been declared
in RANGE statements; the order of these names indicates the variable
associated with each. These are relid. The second entry is a pointer
to an in-core copy of the system catalogue description for that
relation. The third entry is a flag which is set when the
corresponding variable has been selected for substitution.

The use of this table will aid decomposition in the use of temporary
relations. When a new range is created for a variable by execution of
a one-variable query, the entry in the range table for that variable
is the same except for the pointer to the catalogue description. The
relid is always the original relation name for that variable and the
descriptor is for the current subrelation it is ranging over. In this
way, if a temporary relation must be created several times during the
process of substitution, the same temporary relation name and
descriptor can be reused by simply deleting the old tuples from the
previous iteration. This saves much overhead in the creation of
temporary relations.

Incidence Matrix:

This is a binary matrix of clauses (or subqueries) vs. variables
which is used within decomposition to represent the current query
under consideration. It is used during reduction to determine all
irreducible subqueries and can be used during selection to represent
the component subqueries in a compact form. This matrix will also
contain an entry for each clause which points to the actual clause so
that it may be easily obtained when it is necessary to build a query
tree for execution of a subquery.
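
To make the clauses-vs.-variables representation concrete, the sketch below builds such a binary matrix and groups the clauses into connected pieces once one variable's column is ignored, as happens after a tuple has been substituted for that variable. It is not the reduction algorithm of the paper; the function names and the union-find bookkeeping are assumptions of the sketch.

```python
def incidence_matrix(clauses, n_vars):
    """clauses: list of sets of variable indices. Returns one 0/1 row per clause."""
    return [[1 if v in c else 0 for v in range(n_vars)] for c in clauses]

def components_without(matrix, var):
    """Group clause indices into connected components, ignoring column `var`."""
    n = len(matrix)
    parent = list(range(n))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    n_vars = len(matrix[0]) if matrix else 0
    for v in range(n_vars):
        if v == var:
            continue
        rows = [r for r in range(n) if matrix[r][v]]
        for r in rows[1:]:
            parent[find(r)] = find(rows[0])  # clauses sharing v are joined
    groups = {}
    for r in range(n):
        groups.setdefault(find(r), []).append(r)
    return sorted(groups.values())
```

With four clauses over the variable sets {0,1}, {1,2}, {0,3} and {3}, ignoring variable 0 leaves two pieces, clauses 0 and 1 (joined by variable 1) and clauses 2 and 3 (joined by variable 3).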



Query Tree:

The parser sends a list representation of the query tree to
decomposition, which then rebuilds the query tree, adding useful
information as it is recognized. The general form of the tree is a
root node with the target list of the query as the left branch and
the qualification as the right branch. Since the query is in
conjunctive normal form, all the intermediate nodes along the right
side will be AND (conjunction) nodes.

[Figure: query tree. The left branch holds the target list elements;
the right branch is a chain of AND nodes, each with a disjunctive
clause as its left branch, terminated by END.]

More specifically, nodes of the tree are defined as:

    struct querytree
    {   struct querytree *left, *right;
        struct symbol sym;
    };

where left and right are the pointers to the respective branches. The
second entry defines the structure within the node and this varies
depending on the type of node.

For nodes representing arithmetic operators, disjunctions (OR),
result domains and constants, the structure is:

    struct symbol
    {   char type;
        char len;
        int value[];
    };

where type is a code representing the type of the node (i.e., plus,
minus, OR, etc.) and len is the length in bytes of value. value is a
variable length field (0-255 bytes) and contains the appropriate
value for that type of node. For example, if the node is representing
a constant then the value contains the actual constant.

For nodes representing a variable attribute (i.e., E.SALARY) the
structure is:

    struct symbol
    {   char type;
        char len;
        char varno, attno;
        char frmt, frml;
        char *valptr;
    };

where type is the same as above and len is fixed. varno is an index
into the range table for the correct variable; attno is the domain
number (from the system catalogue) of the correct domain referenced.
frmt and frml give the format of the attribute (i.e., i2, etc.). This
is used to determine new domain types and for calculations. The last
entry is used during tuple substitution. If a particular variable is
selected for substitution, all variable attribute nodes involving
that variable will become nodes representing constants. But the tree
itself need not be changed. This field, valptr, is simply set to
point to the constant value that should be used. This position
remains fixed, so when a new tuple value is substituted, the pointer
does not change; only the value it is pointing to changes. In this
way, a new tree is not needed for each level of substitution or for
each iteration of substitution values. If the pointer is zero, the
variable information is used; if it is nonzero, it is a constant
node.
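
The effect of the fixed pointer can be imitated in a few lines. The sketch below is only an analogy in another language: a shared mutable list plays the role of the fixed storage position, `None` plays the role of a zero pointer, and the names `AttrNode`, `bind` and `substitute` are hypothetical.

```python
class AttrNode:
    """A variable-attribute node; valptr is None until its variable is substituted."""

    def __init__(self, varno, attno):
        self.varno = varno
        self.attno = attno
        self.valptr = None  # None stands in for a zero pointer

    def is_constant(self):
        return self.valptr is not None

    def value(self):
        # Read the current substituted value through the fixed reference.
        return self.valptr[self.attno]

def bind(nodes, varno, buffer):
    """Point every node of the selected variable at one shared buffer."""
    for node in nodes:
        if node.varno == varno:
            node.valptr = buffer

def substitute(buffer, tup):
    """Overwrite the buffer in place; no node and no tree is rebuilt."""
    buffer[:] = tup
```

After `bind` has been called once for the selected variable, each call to `substitute` changes what every bound node evaluates to, while nodes of other variables remain ordinary variable-attribute nodes.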

For nodes representing the root and conjunctions (AND), the structure
is:

    struct symbol
    {   char type;
        char len;
        char tvarc;
        char lvarc;
        int lvarm;
        int rvarm;
    };

where type is the same and len is fixed. tvarc and lvarc are both
counts of the variables used. tvarc is the number of variables in the
sub-tree below this node and lvarc is the number of variables in the
left branch. So for the root node, tvarc is the total number of
variables in the query and lvarc is the number of variables in the
target list. For an AND node, tvarc is the number of variables in the
remaining clauses and lvarc is the number of variables in the single
clause of its left branch. lvarm and rvarm are bit maps of the
variables used in the left and right branches of the node,
respectively.
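
How counts and bit maps of this kind relate can be shown with ordinary integer bit operations. The helper names below are hypothetical, and the sketch assumes that bit v of a map is set exactly when variable number v occurs in the branch.

```python
def var_bitmap(varnos):
    """Build a bit map with bit v set for each variable number v."""
    bitmap = 0
    for v in varnos:
        bitmap |= 1 << v
    return bitmap

def var_count(bitmap):
    """Number of variables recorded in a bit map."""
    return bin(bitmap).count("1")

# Example: left branch uses variables 0 and 2, right branch uses 2 and 3.
lvarm = var_bitmap([0, 2])
rvarm = var_bitmap([2, 3])
lvarc = var_count(lvarm)          # variables in the left branch
tvarc = var_count(lvarm | rvarm)  # variables in the whole sub-tree
shared = lvarm & rvarm            # variables common to both branches
```

The union of the two maps gives the variables of the sub-tree, and their intersection gives the variables that link the left clause to the remaining clauses.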

This structure is not as costly as it might appear. It is true that
during decomposition many subqueries are created and executed many
times, but it should be noted that all of these subqueries use
clauses which appear in the original query. The target lists may
change, but no new clauses are ever created except through
substitution. Since this is true, when a subquery is to be executed,
a query tree can be constructed using nodes from the original tree. A
new root node must be created for each subquery and for some target
list nodes, but all the AND nodes can simply be detached from the
original query tree and added to the new query tree.